A Searchable Map of PubChem

نویسندگان

  • Ruud van Deursen
  • Lorenz C. Blum
  • Jean-Louis Reymond
چکیده

The database PubChem was classified using 42 integer value descriptors of molecular structure, here called molecular quantum numbers (MQNs), which count atoms and bond types, polar groups, and topological features. Principal component analysis of the MQN data set shows that PubChem compounds occupy a partially filled elliptical cone in the (PC1,PC2,PC3) space whose axis is the first principal component PC1 (65% variability) representing molecular size, and the ellipse axes are PC2 (18% variability, representing structural flexibility) and PC3 (7% variability, representing polarity). A visual overview of PubChem is provided by color-coded representations of the (PC2,PC3) plane. The MQNs form a scalar fingerprint which can be used to measure the similarity between pairs of molecules and enable ligand-based virtual screening, as illustrated for the enrichment of bioactives from the DUD data set from PubChem. An MQN-annotated version of PubChem with an MQN-similarity search tool is available at www.gdb.unibe.ch .

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

PubChem's BioAssay Database

PubChem (http://pubchem.ncbi.nlm.nih.gov) is a public repository for biological activity data of small molecules and RNAi reagents. The mission of PubChem is to deliver free and easy access to all deposited data, and to provide intuitive data analysis tools. The PubChem BioAssay database currently contains 500,000 descriptions of assay protocols, covering 5000 protein targets, 30,000 gene targe...

متن کامل

PubChem BioAssay: 2014 update

PubChem's BioAssay database (http://pubchem.ncbi.nlm.nih.gov) is a public repository for archiving biological tests of small molecules generated through high-throughput screening experiments, medicinal chemistry studies, chemical biology research and drug discovery programs. In addition, the BioAssay database contains data from high-throughput RNA interference screening aimed at identifying cri...

متن کامل

Exploring chemical space for drug discovery using the chemical universe database.

Herein we review our recent efforts in searching for bioactive ligands by enumeration and virtual screening of the unknown chemical space of small molecules. Enumeration from first principles shows that almost all small molecules (>99.9%) have never been synthesized and are still available to be prepared and tested. We discuss open access sources of molecules, the classification and representat...

متن کامل

DrugBank: a comprehensive resource for in silico drug discovery and exploration

DrugBank is a unique bioinformatics/cheminformatics resource that combines detailed drug (i.e. chemical) data with comprehensive drug target (i.e. protein) information. The database contains >4100 drug entries including >800 FDA approved small molecule and biotech drugs as well as >3200 experimental drugs. Additionally, >14,000 protein or drug target sequences are linked to these drug entries. ...

متن کامل

Visualisation and subsets of the chemical universe database GDB-13 for virtual screening

The chemical universe database GDB-13, which enumerates 977 million organic molecules up to 13 atoms of C, N, O, S and Cl following simple chemical stability and synthetic feasibility rules, represents a vast reservoir for new fragments. GDB-13 was classified using the MQN-system discussed in the preceding paper for the analysis of PubChem fragments. Two hundred and fifty-five subsets of GDB-13...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of chemical information and modeling

دوره 50 11  شماره 

صفحات  -

تاریخ انتشار 2010